Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique

Author

  • Somchai Prasitjutrakul
Abstract

This paper presents an algorithm for Thai-English cross-language transliterated word retrieval. The algorithm retrieves documents containing either the English keywords or the corresponding English-to-Thai transliterated words. It does so by matching documents on the phonetic codes of keywords rather than on the keywords themselves. The phonetic coding is based on the Soundex coding of Odell and Russell, with the encoding table slightly modified to incorporate Thai characters and the code extended to unlimited length. Experimental results showed that recall and precision of more than 80% can be achieved, especially when the phonetic codes are longer than four.
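
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the English digit groups are the classic Soundex ones, but the Thai consonant assignments in `THAI_GROUPS` are hypothetical placeholders, since the paper's modified table is not reproduced here. The sketch also makes two simplifying assumptions: the code is digits-only (the leading letter of classic Soundex is dropped so that codes from the two scripts are comparable), and every unmapped character, including `h` and `w`, simply breaks a run of equal digits. The names `phonetic_code` and `build_index` are made up for this example, and documents are assumed to be already split into words.

```python
from collections import defaultdict

# Classic Soundex digit groups for English letters.
ENGLISH_GROUPS = {
    "1": "bfpv",
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}

# HYPOTHETICAL Thai consonant assignments, for illustration only.
# The paper's actual encoding table may differ.
THAI_GROUPS = {
    "1": "บปพฟ",
    "2": "กคซส",
    "3": "ดตท",
    "4": "ล",
    "5": "มน",
    "6": "ร",
}

# Flatten both tables into one character -> digit lookup.
CODE = {
    ch: digit
    for groups in (ENGLISH_GROUPS, THAI_GROUPS)
    for digit, chars in groups.items()
    for ch in chars
}


def phonetic_code(word, max_len=None):
    """Return a digits-only phonetic code; max_len=None leaves it unlimited."""
    digits = []
    prev = None
    for ch in word.lower():
        digit = CODE.get(ch)
        if digit is None:
            # Vowels, tone marks, and unmapped characters emit nothing
            # and end the current run of identical digits.
            prev = None
            continue
        if digit != prev:
            digits.append(digit)
        prev = digit
    code = "".join(digits)
    return code if max_len is None else code[:max_len]


def build_index(docs):
    """Map each phonetic code to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            index[phonetic_code(word)].add(doc_id)
    return index


docs = {
    "d1": ["computer", "network"],
    "d2": ["คอมพิวเตอร์"],
    "d3": ["printer"],
}
index = build_index(docs)
matches = index[phonetic_code("computer")]  # documents d1 and d2
```

Under these assumed tables, the English keyword and its Thai transliteration map to the same code, so a query for either form retrieves both documents. Truncating with `max_len` shortens the code and makes collisions between unrelated words more likely, which fits the abstract's observation that longer codes give better results.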

Similar resources

Soundex-based Translation Correction in Urdu–English Cross-Language Information Retrieval

Cross-language information retrieval is difficult for languages with few processing tools or resources, such as Urdu. An easy way of translating content words is provided by Google Translate, but due to lexicon limitations named entities (NEs) are transliterated letter by letter. The resulting NE errors (zynydyny zdn for Zinedine Zidane) hurt retrieval. We propose to replace English non-words ...

Handling OOV Words in Indian-language - English CLIR

Because of the lack of resources, cross-lingual information retrieval is a difficult task for many Indian languages. Google Translate provides an easy way of translating from Indian languages to English, but due to lexicon limitations most out-of-vocabulary words are transliterated letter by letter along with their suffixes, resulting in an unusually long string. The resulting string often do...

New Techniques in Thai-English Transliterated Words Searching, Applied to Our New Webservices Platform for Tourism (WICHAI)

This research proposes a new technique in Thai-English transliterated words searching by using the similar-sounding words database with similarity and relatedness calculations. The proposed search engine is used at the core of the “WICHAI” platform for tourism. The newly proposed method can help tourists, who do not speak the Thai language, to find relevant and accurate keywords on the Internet...

Transliterated Word Identification and Application to Query Translation Mining

Query translation mining is a key technique in cross-language information retrieval and machine translation knowledge acquisition. For better performance, the queries are classified into transliterated words and non-transliterated words based on a transliterated word identification model, and are then channeled to different mining processes. This paper is a pilot study on query classification ...

Mandarin-English Information (MEI): investigating translingual speech retrieval

This paper describes the Mandarin–English Information (MEI) project, where we investigated the problem of cross-language spoken document retrieval (CL-SDR), and developed one of the first English–Chinese CL-SDR systems. Our system accepts an entire English news story (text) as query, and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence, this is a cross-langua...

Publication date: 1998